Optimization of out-of-core data preparation methods identifying runtime determining factors

Authors

  • Tamás Schrádi
  • Ákos Dudás
Abstract

In the data preparation phase of a data mining task, the raw, fine-grained data has to be transformed according to the analytical aims into a more compact form that represents the data at a higher level of abstraction, suitable for both machine processing and human understanding. Vast datasets require sophisticated out-of-core methods, which are designed to handle such datasets by using external storage during execution. In this paper we investigate, from both theoretical and practical points of view, different pre-processing approaches that overcome the limit imposed by the size of main memory. We propose alternatives for different processing scenarios. Both of the proposed out-of-core algorithms are capable of processing datasets that are orders of magnitude larger than main memory, and they do so in a fault-tolerant way, even on an average PC.
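As an illustration of the general out-of-core idea mentioned in the abstract, the sketch below aggregates fine-grained records into per-key totals while keeping only a bounded number of keys in memory. It is not one of the authors' algorithms: it uses a generic spill-and-merge scheme, and the names `aggregate_out_of_core`, `max_keys_in_memory`, and `tmp_dir` are hypothetical. It assumes string keys and numeric values, and it does not attempt the fault tolerance the paper describes.

```python
import csv
import heapq
import os
import tempfile
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def _spill(partial, tmp_dir):
    """Write one in-memory partial aggregate to disk as a key-sorted run."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        for key in sorted(partial):
            writer.writerow([key, partial[key]])
    return path


def _read_run(path):
    """Stream (key, value) pairs back from a sorted run file."""
    with open(path, newline="") as f:
        for key, value in csv.reader(f):
            yield key, float(value)


def aggregate_out_of_core(records, max_keys_in_memory, tmp_dir):
    """Sum values per key, holding at most max_keys_in_memory keys in RAM.

    records: any iterable of (str key, number) pairs, e.g. a generator
    reading a large file. Yields (key, total) in ascending key order.
    """
    runs = []
    partial = defaultdict(float)
    for key, value in records:
        partial[key] += value
        # Assumption: memory use is approximated by the number of distinct keys.
        if len(partial) >= max_keys_in_memory:
            runs.append(_spill(partial, tmp_dir))
            partial.clear()
    if partial:
        runs.append(_spill(partial, tmp_dir))

    # k-way merge of the sorted runs; each run is streamed from disk.
    merged = heapq.merge(*(_read_run(p) for p in runs), key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

    # Remove the temporary runs once the merge has been fully consumed.
    for path in runs:
        os.remove(path)
```

The input can be far larger than memory because each record is touched once while streaming, and the final merge reads the sorted runs sequentially, which suits disk access patterns.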


Similar resources

Identifying Flow Units Using an Artificial Neural Network Approach Optimized by the Imperialist Competitive Algorithm

The spatial distribution of petrophysical properties within the reservoirs is one of the most important factors in reservoir characterization. Flow units are the continuous body over a specific reservoir volume within which the geological and petrophysical properties are the same. Accordingly, an accurate prediction of flow units is a major task to achieve a reliable petrophysical description o...


Parallelization of Irregular Codes Including Out-of-Core Data and Index Arrays

This paper describes techniques for implementing irregular out-of-core codes on distributed memory machines. These codes involve data arrays and other data structures that are too large to fit in main memory; so data needs to be stored on disks and fetched during the execution of the program. The efficient use of disk storage is a critical factor that determines the performance of these application...


Maltodextrine nanoparticles loaded with polyphenolic extract from apple industrial waste: preparation, optimization and characterization

The main aim of this study was to prepare apple pomace polyphenolic extract (APPE- referred to as a core) loaded into biodegradable and commercially available natural polymer such as maltodextrin (MD-referred to as a shell). The polymer coating potentially improves its low stability and bioavailability and also directs the control release of the encapsulated material. The MD-nanoparticles (NPs)...


On the Comparison of the Applications of Conventional Ranking Techniques in Determining the Priority Factors Affecting Seed Production of Medicinal Plants: Case of Guilan Province, Iran

Aimed at identifying and prioritizing promoters and deterrents affecting seed production of medicinal plants, the present study was conducted in 2014. The Delphi method was conducted using a panel of 13 experts in Guilan Natural Resources and Agriculture Organization. In the first round of the study, multiple-response techniques were used for content analysis. Based on the results of the first r...


Design, Optimization Process and Efficient Analysis for Preparation of Copolymer-Coated Superparamagnetic Nanoparticles

Magnetic nanoparticles (MNPs) are very important systems with potential use in drug delivery systems, ferrofluids, and effluent treatment. In many situations, such as in biomedical applications, it is necessary to cover inorganic magnetic particles with an organic material, such as polymers. A superparamagnetic nanocomposite Fe3O4/poly(maleic anhydride-co-acrylic acid) P(MAH-co-AA) with a core/...




Publication date: 2011